智能论文笔记

Discrete Morse Sandwich: Fast Computation of Persistence Diagrams for Scalar Data -- An Algorithm and A Benchmark

Pierre Guillou , Jules Vidal , Julien Tierny

分类：机器学习 | 计算机视觉

2022-06-27

本文介绍了用于持久图计算的有效算法，给定一个输入分段线性标量字段f在D上定义的d二维简单复杂k，并带有$ d \ leq 3 $。我们的方法通过引入三个主要加速度来扩展开创性的“ Paircells”算法。首先，我们在离散摩尔斯理论的设置中表达了该算法，该算法大大减少了要考虑的输入简单数量。其次，我们介绍了问题的分层方法，我们称之为“夹心”。具体而言，minima-saddle持久性对（$ d_0（f）$）和鞍 - 最大持久对（$ d_ {d-1}（f）$）是通过与Union-Find-Find-Find-Find-Find-Find-Find-Find-find-find-find-find-find-find-find-find-find-find-find-find-find of nourstable组的1个有效计算的。 - addles和（D-1）addles的稳定集。尺寸为0和（D-1）的快速处理进一步减少，并且大幅度降低了$ d_1（f）$，即三明治的中间层的计算$ d_1（f）$的关键简单数量。第三，我们通过共享记忆并行性记录了几个绩效改进。我们为可重复性目的提供了算法的开源实施。我们还贡献了一个可重复的基准软件包，该基准软件包利用了公共存储库中的三维数据，并将我们的算法与各种公开可用的实现进行了比较。广泛的实验表明，我们的算法提高了两个数量级，即它扩展的开创性“ Paircells”算法的时间性能。此外，它还改善了14种竞争方法的选择，改善了记忆足迹和时间性能，比最快的可用方法具有可观的增长，同时产生了严格的输出。我们通过应用于表面，音量数据和高维点云的持续性一维发电机的快速和稳健提取的应用来说明我们的贡献实用性。

translated by 谷歌翻译

Quantum-Inspired Tensor Neural Networks for Option Pricing

Raj G. Patel , Chia-Wei Hsing , Serkan Sahin , Samuel Palmer , Saeed S. Jahromi , Shivam Sharma , Tomas Dominguez , Kris Tziritas , Christophe Michel , Vincent Porte

分类：机器学习

2022-12-28

Recent advances in deep learning have enabled us to address the curse of dimensionality (COD) by solving problems in higher dimensions. A subset of such approaches of addressing the COD has led us to solving high-dimensional PDEs. This has resulted in opening doors to solving a variety of real-world problems ranging from mathematical finance to stochastic control for industrial applications. Although feasible, these deep learning methods are still constrained by training time and memory. Tackling these shortcomings, Tensor Neural Networks (TNN) demonstrate that they can provide significant parameter savings while attaining the same accuracy as compared to the classical Dense Neural Network (DNN). In addition, we also show how TNN can be trained faster than DNN for the same accuracy. Besides TNN, we also introduce Tensor Network Initializer (TNN Init), a weight initialization scheme that leads to faster convergence with smaller variance for an equivalent parameter count as compared to a DNN. We benchmark TNN and TNN Init by applying them to solve the parabolic PDE associated with the Heston model, which is widely used in financial pricing theory.

translated by 谷歌翻译

Langevin algorithms for very deep Neural Networks with application to image classification

Pierre Bras

分类：机器学习 | (统计)机器学习

2022-12-27

Training a very deep neural network is a challenging task, as the deeper a neural network is, the more non-linear it is. We compare the performances of various preconditioned Langevin algorithms with their non-Langevin counterparts for the training of neural networks of increasing depth. For shallow neural networks, Langevin algorithms do not lead to any improvement, however the deeper the network is and the greater are the gains provided by Langevin algorithms. Adding noise to the gradient descent allows to escape from local traps, which are more frequent for very deep neural networks. Following this heuristic we introduce a new Langevin algorithm called Layer Langevin, which consists in adding Langevin noise only to the weights associated to the deepest layers. We then prove the benefits of Langevin and Layer Langevin algorithms for the training of popular deep residual architectures for image classification.

translated by 谷歌翻译

Packing Privacy Budget Efficiently

Pierre Tholoniat , Kelly Kostopoulou , Mosharaf Chowdhury , Asaf Cidon , Roxana Geambasu , Mathias Lécuyer , Junfeng Yang

分类：机器学习

2022-12-26

Machine learning (ML) models can leak information about users, and differential privacy (DP) provides a rigorous way to bound that leakage under a given budget. This DP budget can be regarded as a new type of compute resource in workloads of multiple ML models training on user data. Once it is used, the DP budget is forever consumed. Therefore, it is crucial to allocate it most efficiently to train as many models as possible. This paper presents the scheduler for privacy that optimizes for efficiency. We formulate privacy scheduling as a new type of multidimensional knapsack problem, called privacy knapsack, which maximizes DP budget efficiency. We show that privacy knapsack is NP-hard, hence practical algorithms are necessarily approximate. We develop an approximation algorithm for privacy knapsack, DPK, and evaluate it on microbenchmarks and on a new, synthetic private-ML workload we developed from the Alibaba ML cluster trace. We show that DPK: (1) often approaches the efficiency-optimal schedule, (2) consistently schedules more tasks compared to a state-of-the-art privacy scheduling algorithm that focused on fairness (1.3-1.7x in Alibaba, 1.0-2.6x in microbenchmarks), but (3) sacrifices some level of fairness for efficiency. Therefore, using DPK, DP ML operators should be able to train more models on the same amount of user data while offering the same privacy guarantee to their users.

translated by 谷歌翻译

Adapting to game trees in zero-sum imperfect information games

Côme Fiegel , Pierre Ménard , Tadashi Kozuno , Rémi Munos , Vianney Perchet , Michal Valko

分类： (统计)机器学习 | 机器学习

2022-12-23

Imperfect information games (IIG) are games in which each player only partially observes the current game state. We study how to learn $\epsilon$-optimal strategies in a zero-sum IIG through self-play with trajectory feedback. We give a problem-independent lower bound $\mathcal{O}(H(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ on the required number of realizations to learn these strategies with high probability, where $H$ is the length of the game, $A_{\mathcal{X}}$ and $B_{\mathcal{Y}}$ are the total number of actions for the two players. We also propose two Follow the Regularize leader (FTRL) algorithms for this setting: Balanced-FTRL which matches this lower bound, but requires the knowledge of the information set structure beforehand to define the regularization; and Adaptive-FTRL which needs $\mathcal{O}(H^2(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ plays without this requirement by progressively adapting the regularization to the observations.

translated by 谷歌翻译

Langevin algorithms for Markovian Neural Networks and Deep Stochastic control

Pierre Bras

分类：机器学习 | (统计)机器学习

2022-12-22

Stochastic Gradient Descent Langevin Dynamics (SGLD) algorithms, which add noise to the classic gradient descent, are known to improve the training of neural networks in some cases where the neural network is very deep. In this paper we study the possibilities of training acceleration for the numerical resolution of stochastic control problems through gradient descent, where the control is parametrized by a neural network. If the control is applied at many discretization times then solving the stochastic control problem reduces to minimizing the loss of a very deep neural network. We numerically show that Langevin algorithms improve the training on various stochastic control problems like hedging and resource management, and for different choices of gradient descent methods.

translated by 谷歌翻译

MULTI3NLU++: A Multilingual, Multi-Intent, Multi-Domain Dataset for Natural Language Understanding in Task-Oriented Dialogue

Nikita Moghe , Evgeniia Razumovskaia , Liane Guillou , Ivan Vulić , Anna Korhonen , Alexandra Birch

分类：自然语言处理

2022-12-20

Task-oriented dialogue (TOD) systems have been applied in a range of domains to support human users to achieve specific goals. Systems are typically constructed for a single domain or language and do not generalise well beyond this. Their extension to other languages in particular is restricted by the lack of available training data for many of the world's languages. To support work on Natural Language Understanding (NLU) in TOD across multiple languages and domains simultaneously, we constructed MULTI3NLU++, a multilingual, multi-intent, multi-domain dataset. MULTI3NLU++ extends the English-only NLU++ dataset to include manual translations into a range of high, medium and low resource languages (Spanish, Marathi, Turkish and Amharic), in two domains (banking and hotels). MULTI3NLU++ inherits the multi-intent property of NLU++, where an utterance may be labelled with multiple intents, providing a more realistic representation of a user's goals and aligning with the more complex tasks that commercial systems aim to model. We use MULTI3NLU++ to benchmark state-of-the-art multilingual language models as well as Machine Translation and Question Answering systems for the NLU task of intent detection for TOD systems in the multilingual setting. The results demonstrate the challenging nature of the dataset, particularly in the low-resource language setting.

translated by 谷歌翻译

Optimal Transport for Unsupervised Hallucination Detection in Neural Machine Translation

Nuno M. Guerreiro , Pierre Colombo , Pablo Piantanida , André F. T. Martins

分类：自然语言处理 | 机器学习

2022-12-19

Neural machine translation (NMT) has become the de-facto standard in real-world machine translation applications. However, NMT models can unpredictably produce severely pathological translations, known as hallucinations, that seriously undermine user trust. It becomes thus crucial to implement effective preventive strategies to guarantee their proper functioning. In this paper, we address the problem of hallucination detection in NMT by following a simple intuition: as hallucinations are detached from the source content, they exhibit encoder-decoder attention patterns that are statistically different from those of good quality translations. We frame this problem with an optimal transport formulation and propose a fully unsupervised, plug-in detector that can be used with any attention-based NMT model. Experimental results show that our detector not only outperforms all previous model-based detectors, but is also competitive with detectors that employ large models trained on millions of samples.

translated by 谷歌翻译

Rainproof: An Umbrella To Shield Text Generators From Out-Of-Distribution Data

Maxime Darrin , Pablo Piantanida , Pierre Colombo

分类：自然语言处理

2022-12-18

As more and more conversational and translation systems are deployed in production, it is essential to implement and to develop effective control mechanisms guaranteeing their proper functioning and security. An essential component to ensure safe system behavior is out-of-distribution (OOD) detection, which aims at detecting whether an input sample is statistically far from the training distribution. Although OOD detection is a widely covered topic in classification tasks, it has received much less attention in text generation. This paper addresses the problem of OOD detection for machine translation and dialog generation from an operational perspective. Our contributions include: (i) RAINPROOF a Relative informAItioN Projection ODD detection framework; and (ii) a more operational evaluation setting for OOD detection. Surprisingly, we find that OOD detection is not necessarily aligned with task-specific measures. The OOD detector may filter out samples that are well processed by the model and keep samples that are not, leading to weaker performance. Our results show that RAINPROOF breaks this curse and achieve good results in OOD detection while increasing performance.

translated by 谷歌翻译

BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric

Mingda Chen , Paul-Ambroise Duquenne , Pierre Andrews , Justine Kao , Alexandre Mourachko , Holger Schwenk , Marta R. Costa-jussà

分类：自然语言处理

2022-12-16

End-to-End speech-to-speech translation (S2ST) is generally evaluated with text-based metrics. This means that generated speech has to be automatically transcribed, making the evaluation dependent on the availability and quality of automatic speech recognition (ASR) systems. In this paper, we propose a text-free evaluation metric for end-to-end S2ST, named BLASER, to avoid the dependency on ASR systems. BLASER leverages a multilingual multimodal encoder to directly encode the speech segments for source input, translation output and reference into a shared embedding space and computes a score of the translation quality that can be used as a proxy to human evaluation. To evaluate our approach, we construct training and evaluation sets from more than 40k human annotations covering seven language directions. The best results of BLASER are achieved by training with supervision from human rating scores. We show that when evaluated at the sentence level, BLASER correlates significantly better with human judgment compared to ASR-dependent metrics including ASR-SENTBLEU in all translation directions and ASR-COMET in five of them. Our analysis shows combining speech and text as inputs to BLASER does not increase the correlation with human scores, but best correlations are achieved when using speech, which motivates the goal of our research. Moreover, we show that using ASR for references is detrimental for text-based metrics.

translated by 谷歌翻译